PRIVACY AND THE DIGITAL ARCHIVE: 
OUTLINING KEY ISSUES 


Marc Rotenberg, director 
Electronic Privacy Information Center 
Washington, DC 
www.epic.org 


DOCUMENTING THE DIGITAL AGE 
San Francisco 
February 1997 


At first glance, a privacy advocate among digital archivists is a none too popular person. 
Privacy concerns often create obstacles to access, restrict the collection of useful information, or 
raise public concerns about the merits of the underlying endeavor. A recent article on problems
with IRS record-keeping went so far as to blame privacy protection for the failure to
construct an information management system that could process the nation’s tax returns.


It is not the goal of this paper to argue that privacy protection does not at times impose 
costs. Instead, the purpose is to provide an overview of privacy as a legal and sociological 
concept, to describe several of the privacy issues that arise in the digital environment, to identify 
some of the critical issues, and also to encourage new thinking about the relationship of privacy 
and efforts to promote access to public information. 


Privacy will be an issue in the development of an Internet archive. An examination of key 
issues will help clarify underlying problems and may help resolve some of the privacy concerns. 


PAST AS PROLOGUE 


In 1987 Judge Robert Bork was nominated to be an associate justice for the United States 
Supreme Court. The confirmation hearing was hotly contested. Supporters of Judge Bork 
pointed to his distinguished professional career as a law school professor, solicitor general, and 
appellate judge. Opponents questioned his views on such issues as civil rights, abortion, and
antitrust. Judge Bork’s scholarly articles were debated. His judicial opinions were dissected. The
nominee of a Republican President was questioned at length by the Democratic chair of the 
committee. 


Then in the midst of the confirmation battle, a reporter obtained a list of the video rental 
records for the Bork family from a local video store. The reporter published an article in the City 
Paper on the private viewing preferences of the Supreme Court nominee. Suddenly the public 
debate on the nomination turned from the judge’s scholarship to his choice of John Wayne and 
James Bond movies. Eventually, and for other reasons, Judge Bork was not confirmed. A few 
months after the debate concluded, Congress enacted a law, the Video Privacy Protection Act, to 
limit the disclosure of video rental records. 


While the disclosure of Judge Bork’s video viewing habits played little role in the ultimate 
decision of the Judiciary Committee, the incident and the subsequent debate about the release of
video records provided the country a concrete example of privacy issues in the digital age. 


Imagine a judicial hearing in the year 2007. The nominee comes before the Judiciary 
Committee. Should every transaction that links her name to a record on the web be available for
public scrutiny?


WHAT IS PRIVACY? 


More than a century ago, Louis Brandeis described privacy as the “right to be let alone.” It
was a sweeping phrase that is still today often cited by courts and commentators. It is also an 
incomplete definition. As a legal concept, Dean Prosser said that there were in fact four distinct 
privacy interests -- a right to prevent the disclosure of private facts, a right to prevent intrusion
into seclusion, a right to prevent the presentation of one’s self in a false light, and a right to 
prevent commercial appropriation of personality. Scholars have said that privacy is also a right 
of autonomy, of dignity, of freedom in personal decision-making, and of association. 


There is also the privacy right derived from the Fourth Amendment which limits the 
power of the state to conduct searches and intrude into private life. It is the legal basis for the 
requirement that officers of the state typically must obtain a warrant before they can enter your 
home or seize your papers or possessions. 


These are general descriptions of privacy. In practice, privacy is often found in law as 
specific limitations on the use of personal information. Many privacy laws today are codified in 
statute, such as the Fair Credit Reporting Act or the Video Privacy Protection Act
described above. Courts also continue to interpret the Constitutional restrictions on government 
searches and the common law concept of privacy set out by Brandeis. Privacy laws are therefore 
a combination of rules enacted by legislatures and rulings of judges. 

As a general matter, privacy laws have several common attributes: 

* Privacy is fundamentally a claim of individuals


* Privacy concerns personally identifiable information


* Privacy issues arise in the collection, use, maintenance or disclosure of personally
identifiable information


* Privacy claims are given greatest weight in relations of trust


* Privacy issues are universal


The Internet raises the prospect that new privacy issues will arise. Far more information 
on individuals can be collected. Inferences about individual behavior could go much further. 


Key issue: Are currently understood concepts of privacy adequate to understand the 
issues that a digital archive would raise? 


WHY PRIVACY MATTERS 


The protection of privacy is an article of faith among Internet users. This can be seen in 
the political activity of net activists on such issues as Clipper, Digital Telephony, and P-Trak.
It is reflected in the ongoing effort of organizations to develop privacy policies. It is also clear 
from public polling data, including a recent poll from Lou Harris, that the general public is 
concerned about privacy. Most notably, the recent GVU WWW survey also found privacy 
among the top concerns of Internet users around the globe. [http://www.epic.org/privacy/] 


In general, users of the Internet are more concerned about privacy than the general public. 
They value anonymity, oppose the sale of personal information, and favor techniques for
anonymous payment. They are reluctant to disclose personal information if they do not know
how it will be used. 


Perhaps this is not surprising. The Internet itself has promoted a wide array of 
anonymous and pseudo-anonymous activity -- the use of aliases, anonymous FTP, digital 
payment systems, anonymous surfing. The disclosure of personal information on the net is often
a negotiation based on a person’s assessment of the benefits. Some users turn away from web
sites that require registration. 


Is there an empirical basis for the concern about privacy? The answer is yes. At one end 
of the spectrum are the intrusions resulting from unwanted email, a minor inconvenience. At the
other end are moments in history when governments have misused records on citizens. The 
United States used records of the Census Bureau to identify population tracts of
Japanese-American families so that they could be interned during the Second World War. Nazi Germany
made use of administrative records in municipal offices and telephone company records to 
identify Jewish families throughout Europe. Across the spectrum are public concerns about the
loss of employment, insurance, or medical care, resulting from the improper disclosure or misuse
of personal information. 


Key Issue: Are public concerns about privacy issues likely to increase or decrease 
in the years ahead? What would be the consequences of either outcome for an 
effort to archive the web? 


PRIVACY AS AN INSTRUMENTAL VALUE 


Privacy is often described as a human right or a political right. Privacy also plays an
important role in information environments. Communications and information retrieval are two of
the best examples.


When Benjamin Franklin set out plans for the US postal system, he knew that privacy 
would be important. One of the first laws governing the operation of the US mails made it a crime 
to improperly disclose private correspondence. Two centuries later, when Congress considered 
the scope of the federal wiretap law in new digital environments, it recognized a need to extend
privacy protection to electronic mail. Communications from one person to another often contain
the most personal thoughts. 


Privacy also plays an important role in the realm of information retrieval, particularly for 
sensitive or controversial subjects. Web services and telephone hotlines that provide counseling 
on AIDS, gay and lesbian issues, suicide and depression routinely provide techniques for
anonymous communication. Anonymity is necessary to encourage potential users to obtain the
information they are seeking. 


More generally, libraries protect the privacy of borrower records. Many libraries also 
routinely destroy borrower records when library materials are returned. Such policies were
developed to encourage users of libraries to explore unpopular topics without fear of public
recrimination. 


A clear analogy arises online with user logs. Many web operators are reluctant to
disclose the content of these logs to others, and some even have a policy of destroying logs. Still,
web operators can use aggregate data to track site usage and even to obtain advertisers without
compromising the privacy of users. 


Key issue: Should private communications be routinely archived? Should acts of 
information retrieval be routinely archived? 


PUBLIC LIFE AND PRIVATE LIFE 


Privacy can also be understood as the interaction of two distinct social environments -- 
the public realm and the private realm. The German philosopher Jürgen Habermas described a
dynamic relationship between the private sphere and a public sphere. Social life and political
activity occurred in the public sphere. Private life was distinct and occurred in a separate sphere.
Individuals moved between the two spheres at different times for different reasons. In a similar 
light, Erving Goffman, the Berkeley sociologist, observed that in moving from a private realm to a 
public realm, individuals formed bonds of trust. These bonds of trust were based on the 
disclosure of aspects of personality that might occur in one relationship but not another. A 
person might share a secret with a colleague but not with a stranger. 


The web provides an excellent example of the operation of a public sphere and a private 
sphere. Web pages are a part of the public sphere and reflect a wide range of social, commercial, 
and political activity. The private sphere includes email and intranets, communications that are 
personal and organizational relations that are closed. Arguably, the crossing point between the 
private sphere and the public sphere can be found in web usage logs. It could well be the case that 
the freedom to move between a private sphere and a public sphere on the web will be determined 
by the privacy of these records. 


Key issue: Does the creation of an Internet archive threaten the preservation of a 
private sphere on the Internet? 


PRIVATE LIFE AND SURVEILLANCE 


Intrusions into private life can be justified in some circumstances. A doctor conducts a 
physical examination of a patient to ensure an accurate diagnosis. An employer inquires into 
previous employment of a potential employee to determine if the person is suitable for a
particular job. A bank collects credit information on a person requesting a loan to decide if the 
person will be able to repay a loan. Law enforcement officials search a person’s home or place of 
business if there is probable cause to believe that a crime has been committed. 


But routine intrusions into private life are more problematic. Few people would consent 
to ongoing video surveillance, even in a public place, or the routine disclosure of personal 
financial records to others. 


Consider how Justice Brandeis viewed the emergence of the first collection of electronic 
information in the law enforcement context, what I like to call the first cyberspace opinion. The 
question before the Supreme Court in 1928 was whether the practice of intercepting telephone 
communications should be subject to the probable cause requirements of the Fourth Amendment. 
The government argued that since there was no physical intrusion into the defendant’s home,
there was no physical search.


Brandeis was not persuaded. He compared the interception of a telephone communication 
with the interception of a letter sent by the US postal service. Brandeis noted that this type of 
search was not limited in space or time. In a physical search, for example, the officer had a 
warrant for a specific bit of evidence. The search occurred at a fixed point in time, and there was 
also notice to the target. Wiretapping was ongoing and could capture the communications of
innocent parties.


Modern law on wiretapping generally reflects Brandeis’s concern by limiting the duration, 
purpose, and scope of electronic surveillance. 


Looking at the problem of surveillance from a different perspective, Jeremy Bentham, the 
nineteenth century utilitarian philosopher, designed plans for the perfect prison which he called
the panopticon. In such a prison, it would be possible to constantly monitor a prisoner. Bentham
speculated that in such an environment simply the belief that one was being watched would be 
enough to coerce behavior. 


Surveillance, as a routine intrusion into private life, is one of the common themes of many
dystopias such as 1984. There is rarely a private sphere, or it is enjoyed by only a small elite. 


Key Issue: Is it possible to develop archiving practices that do not involve ongoing
surveillance?


PRIVATE LIFE IS PART OF HISTORY


Having said that privacy is a fundamental concern on the Internet, it would nonetheless be
foolish not to recognize that private life is also very much a part of the historical record, and that
one could not properly understand a culture or a period in time without some insight into private
activities. A few examples of popular collections:


* Walker Evans’s photographs of sharecroppers during the Depression


* Edward Steichen’s collection The Family of Man in the 1950s which helped the world
see all the beauty and diversity of people, places, cultures, and traditions


* Robert Coles’s interviews with young children


Some mechanisms should be developed to capture “snapshots” of private life on the
Internet -- particular email messages, even particular web logs.


Key issue: How should information about private life be collected and how should it be 
presented in a web archive? 

THE PROBLEM OF BALANCE


When faced with competing policy interests, it is often tempting to invoke a call for
balance. Apart from its rhetorical value, “balance” grants both claims some legitimacy and avoids
the hard work of making difficult choices. 


But balance is often not the best way to think about privacy issues. This is not because
there are no competing interests. Rather, it is because policy, like the technology it mirrors,
is highly dynamic. The range of policy choices that exist next year to reconcile competing
privacy and public access claims may be very different from the choices available today. 


Let me borrow a device from the economists to help make this point. Economists tell us 
that given two things we value, carrots and cabbage for example, we can often plot our trade-off
of one good for the other on an indifference curve. In the policy realm, we might also make
choices between two policy goals -- privacy protection and public access. Looking at a fixed
range of policy options (PO), we might choose PO¹ which gives us a lot of public access at the
expense of privacy protection. Or we might choose PO² which provides a high level of privacy
protection but at a sacrifice in public access. 


(Figure 1) 


Some problems in information policy involve making choices between PO¹ and PO² which
necessarily pit privacy interests against interests in public access. But in a surprising number of
cases, better policies are found when it is possible to move the indifference curve to a position 
that increases both privacy protection and public access. 


(Figure 2)


What are examples of “moving the curve” to a better range of policy options?


* In 1974, the United States Congress passed the Privacy Act and expanded the scope of 
the Freedom of Information Act. In this way, the privacy of personal records held by the 
federal government was protected, while public access to public records held by the 
government was enhanced. 


* In a recent case, the Supreme Court ruled that the publication of anonymous pamphlets
was protected by the First Amendment. In so holding, the Court both expanded the 
privacy rights of authors to hide their identity and promoted the free flow of information 
by limiting the ability of state governments to restrict the publication of information. 


* In libraries, individuals are typically free to obtain a wide range of materials without any
recording of their interests. These practices encourage access and protect privacy. 


Conversely, it is also possible to imagine a world where both privacy protection and
public access are limited. Prisons, totalitarian governments -- and notably the world of Brother
Francis -- are societies where public access to information is limited, but so too is privacy
protection. There is no private sphere in these worlds. And the absence of a secure private sphere
is reflected in the absence of a robust public sphere.


Key issue: How do we construct a policy for an Internet archive that increases both 
public access and privacy protection? 


INFORMATION POLICY AND OPEN SOCIETIES 


Open societies and democratic societies tend to formalize the protection of the public sphere
as well as the private sphere, to move the curve outward to the open society range of policy
options. Such societies establish procedures that assure the preservation of government records,
the privacy of individual records, and access to public information. A quick survey of such laws
in the United States includes:


* Privacy Act (private records held by government agencies)

* Freedom of Information Act (public access to government records)

* Records Preservation Acts

* Federal Advisory Committee Act (open meetings)

* Depository Library Program


Key issues: How would such laws apply to an Internet archive? Will an Internet archive 
reflect a similar commitment to protecting the public and private spheres? 


CHALLENGE OF ANONYMITY 


Many of the current efforts to protect privacy on the Internet are based on promoting 
anonymity and pseudo-anonymity. A new German ISP law requires the availability of 
anonymous payment mechanisms for commercial providers offering goods and services over the 
Internet. In response to public concern that cookies could be used to transfer personally 
identifiable information, companies such as PGP Inc. have developed a program called “Cookie 
Cutter”. Other efforts include the development of digital cash and the destruction of usage logs. 


Key issue: Will archivists respect the efforts to promote systems and techniques for 
anonymity or will there be pressure to personally identify information or preserve 
information that might otherwise be destroyed? 


One area where the goals of archivists may face trouble with traditional privacy rules 
concerns the obligation of organizations that collect personal information to protect the privacy 
interests of data subjects. These rules are commonly referred to as a Code of Fair Information
Practices and could be simple codes of conduct or formal rules of law. 


A Code of Fair Information Practices typically restricts the ability of an information
collector to disclose personal data to others. A code also grants the data subject a right to
inspect and correct personal information. A code may even include an obligation to destroy
information about individuals once a certain period of time has passed. 


Key Issue: Should an Internet archive be subject to any form of a Code of Fair 
Information Practices? 


EPIC AS EXAMPLE 


The Electronic Privacy Information Center follows an information policy that 
protects privacy and promotes public access. We make every effort to protect the 
privacy of our users -- we destroy our logs, we do not disclose our records to others, we 
are trying to support systems for anonymous payment, and we make a wide range of 
privacy enhancing tools available at our web site. 


But EPIC is also very much interested in preserving and publishing public 
information of interest to the general public. One of our critical goals is to make available
policy documents obtained from the government regarding the development of 
cryptography policy. To accomplish this task we have pursued Freedom of Information 
Act litigation, scanned images of documents, and archived records at our web site. 


Some of these documents could be crucial for understanding critical policy choices 
made in the early stages of the development of the Internet. Consider for example these
items:


* A memo from Brent Scowcroft to George Bush describing Digital Telephony as a
“beachhead” for Clipper


* A memo from the FBI to the National Security Council indicating that Clipper
will only work if made mandatory


Key issue: Could policies that protect privacy and promote public access be adopted by 
an Internet archive? 


THE CASE FOR OPENNESS 


The creation of an Internet archive raises a series of privacy issues, but it is best that 
these issues are debated publicly and that privacy concerns are addressed. What if a large private
corporation or a large government agency undertook a similar effort to collect everything on the
net, but without any opportunity for public discussion or any public awareness of the collection 
activity? 


In beginning a discussion on the privacy implications of an Internet archive, an important 
step has been taken toward the development of policies consistent with open societies. 
Successful resolution of these issues could help avoid the establishment of archives that reflect 
the values of closed societies -- little public access, little privacy protection. 


Key issue: Assuming that it is possible to develop privacy-appropriate standards for an
Internet archive, should steps be taken against other entities that attempt a similar effort
without appropriate safeguards?


GOALS 


From the privacy perspective, the central goal for documenting the digital age should be to
build a public archive that respects private life. This means recognizing that there are aspects of
private life that should not be recorded. It also means ensuring that the content of the archive
is made widely available and not held by a small elite.


Such an archive would not routinely record private activity on the web, but would do so 
selectively. Boundaries on disclosure would be based on promoting techniques to preserve 
anonymity, limitations on collection and disclosure, and recognizing that private spheres exist on 
the net and should be preserved. 


How do we know where private life begins? One answer may be provided by other 
experiences with past technologies for recording our history. It could be said that privacy begins 
when the tape recorder is turned off, when the camera is put down, when the pen rests by the 
paper. That is where private life begins. If we record everything on the web, that could also be 
where privacy ends. 




Figure 1. Privacy Protection and Public Access As Trade-Off. Axis label: Public Access.
Curve: Policy Options. Legend: PO¹, a public access policy option; PO², a privacy
protection policy option.


Figure 2. Privacy Protection and Public Access As Search for Optimal Policy. Axis label:
Public Access. Legend: Policy Options - Initial; Policy Options - Open Society; Policy
Options - Closed Society.


